Binomial distribution family
If a random variable \(X\) follows a Bernoulli distribution, we write \(X \sim \text{Bernoulli}(p)\).
Properties
\[ \begin{align} P ( X = x \mid p ) \; & = \; p^x \, (1 - p)^{1 - x} \\ & = \begin{cases} p \quad & \text{if } x=1 \\ 1 - p \quad & \text{if } x=0 \end{cases} \end{align} \]
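As a quick sketch, this pmf can be written directly in Python (the helper name `bernoulli_pmf` is ours, not a library function):

```python
# Sketch of the Bernoulli pmf written as p^x * (1 - p)^(1 - x).
def bernoulli_pmf(x, p):
    """P(X = x | p) for x in {0, 1}."""
    return p**x * (1 - p)**(1 - x)

# The two cases collapse to p and 1 - p:
assert bernoulli_pmf(1, 0.3) == 0.3
assert abs(bernoulli_pmf(0, 0.3) - 0.7) < 1e-12
```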
Analogy
Tossing a coin once: whether it lands heads or tails follows a Bernoulli distribution.
Suppose we have \(N\) iid trials with success probability \(p\), resulting in random variables \(\mathbf{X} = [X_1, \ldots, X_N]\) such that \[ X_i \sim \text{Bernoulli}(p) . \]
This sequence of iid trials is called a sequence of Bernoulli trials. \[ \begin{aligned} P_{\mathbf{X}}\left( \mathbf{x} \right) &= \prod_{i=1}^N P_{X_i}\left(x_i\right) \quad (\text{independent}) \\ &= \prod_{i=1}^N p^{x_i} (1- p)^{1 - x_i} \\ &= p^{\sum_i x_i} (1- p)^{N - \sum_i x_i} \end{aligned} \]
Examples
Suppose we toss 5 unfair coins with \(p(\text{head}) = 0.3\) and \(p(\text{tail}) = 0.7\). What is the probability of observing [T, T, T, H, H]?
\(p(X=[T, T, T, H, H]) = 0.7^3 \times 0.3^2 = 0.03087\)
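A minimal numerical check of this example, encoding heads as 1 and tails as 0:

```python
# Multiply the per-trial Bernoulli probabilities for [T, T, T, H, H].
p = 0.3
seq = [0, 0, 0, 1, 1]  # 1 = head, 0 = tail
prob = 1.0
for x in seq:
    prob *= p**x * (1 - p)**(1 - x)

assert abs(prob - 0.7**3 * 0.3**2) < 1e-12
print(round(prob, 5))  # 0.03087
```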
We define a new r.v. \(\small Y = \sum_{i=1}^N X_i\), taking values \(\small y \in \{0, 1, \ldots, N\}\) (i.e. \(Y\) is the number of heads in a fixed-length sequence of Bernoulli trials).
\[ \small \begin{aligned} P_Y \left( y \right) = P \left( \left\{ \mathbf{x} : \sum_i x_i = y \right\} \right) = \binom{N}{y} P_{\mathbf{X}}\left( \mathbf{x} \right) = \binom{N}{y} p^y \, (1 - p)^{N - y} . \end{aligned} \]
This is the binomial distribution. For example, with \(N = 3\):
| \(y\) | Outcomes | Probability of each outcome | # permutations |
|---|---|---|---|
| 0 | \(\{ 000 \}\) | \((1-p)^3\) | \(\binom{3}{0}\) |
| 1 | \(\{ 100, 010, 001 \}\) | \(p(1-p)^2\) | \(\binom{3}{1}\) |
| 2 | \(\{ 110, 101, 011 \}\) | \(p^2(1-p)\) | \(\binom{3}{2}\) |
| 3 | \(\{ 111 \}\) | \(p^3\) | \(\binom{3}{3}\) |
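The table can be verified by brute force, enumerating all \(2^3\) outcomes and comparing against the binomial formula (a sketch with an arbitrary \(p = 0.3\)):

```python
from itertools import product
from math import comb

# Enumerate every length-3 Bernoulli sequence and group by its sum y.
p, N = 0.3, 3
for y in range(N + 1):
    brute = sum(
        p**sum(x) * (1 - p)**(N - sum(x))
        for x in product([0, 1], repeat=N)
        if sum(x) == y
    )
    formula = comb(N, y) * p**y * (1 - p)**(N - y)
    assert abs(brute - formula) < 1e-12
```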
If a random variable \(X\) has a binomial distribution, we will write \(X \sim B(N,p)\).
Properties
\[ \small \begin{aligned} & P\left( X = x \mid N, p \right) = \binom{N}{x} p^x \, (1 - p)^{N - x} \end{aligned} \]
Analogy
We toss an unfair coin 10 times with \(p(H) = 0.3\). What is the probability of getting exactly \(x\) heads?
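As a sketch, the whole pmf for this coin can be tabulated from the binomial formula (the 10 tosses and \(p = 0.3\) are from the example; the rest is ours):

```python
from math import comb

# Binomial pmf for 10 tosses of a coin with p(H) = 0.3.
N, p = 10, 0.3
pmf = [comb(N, x) * p**x * (1 - p)**(N - x) for x in range(N + 1)]

assert abs(sum(pmf) - 1) < 1e-12   # a pmf must sum to 1
print(round(pmf[3], 4))            # P(exactly 3 heads) -> 0.2668
```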
Consider a population of \(N\) diploid individuals, and suppose we are interested in a locus with two possible alleles, \(A\) and \(a\).
We want the probability that there will be \(n_{A1}\) copies of \(A\) in the next generation, given that there are \(n_{A0}\) copies in the parent generation:
\[ p(n_{A1} \text{ in offspring} \mid n_{A0} \text{ in parents}) = \binom{2N}{n_{A1}} \left( \frac{n_{A0}}{2N} \right)^{n_{A1}} \left( 1 - \frac{n_{A0}}{2N} \right)^{2N - n_{A1}} \]
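A sketch of this Wright–Fisher transition probability in code (the function name and the example numbers \(N = 10\), \(n_{A0} = 6\) are ours):

```python
from math import comb

def wf_transition(n_A1, n_A0, N):
    """P(n_A1 copies of A among 2N offspring alleles | n_A0 copies in parents)."""
    p = n_A0 / (2 * N)
    return comb(2 * N, n_A1) * p**n_A1 * (1 - p)**(2 * N - n_A1)

# Over all possible offspring counts 0..2N the probabilities sum to 1.
N, n_A0 = 10, 6
total = sum(wf_transition(k, n_A0, N) for k in range(2 * N + 1))
assert abs(total - 1) < 1e-9
```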
We define the random variable \(X\) as the number of tosses needed until the \(r\)-th success. Then \(X\) follows a negative binomial distribution (\(X \sim NB(r, p)\)).
Properties
\[ P\left( X = x \mid r, p \right) \; = \; \binom{x - 1}{r - 1} p^r (1 - p)^{x - r} \]
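A minimal sketch of this pmf (helper name ours); with \(r = 1\) it reduces to the geometric pmf introduced below:

```python
from math import comb

def neg_binom_pmf(x, r, p):
    """P(X = x): the x-th toss yields the r-th success."""
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

# With r = 1 this is exactly the geometric pmf p * (1-p)^(x-1).
p = 0.3
for x in range(1, 10):
    assert abs(neg_binom_pmf(x, 1, p) - p * (1 - p)**(x - 1)) < 1e-12
```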
Analogy
The geometric distribution is the special case of the negative binomial distribution where \(r = 1\).
A random variable \(X \sim G(p)\) following the geometric distribution has the following property:
Properties
\[ P\left( X = x \mid p \right) \; = \; p (1 - p)^{x - 1} \]
Analogy
A mobile game has a 5% chance to draw an SSR card. On average, how many draws do you need to get an SSR card?
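Since the number of draws until the first success is geometric, its mean is \(1/p = 1/0.05 = 20\) draws. A quick simulation sketch (the 5% rate is from the example; the seed and sample size are ours):

```python
import random

random.seed(0)          # deterministic sketch
p = 0.05                # 5% SSR rate

def draws_until_ssr():
    """Count draws until the first success (geometric sample)."""
    n = 1
    while random.random() >= p:
        n += 1
    return n

avg = sum(draws_until_ssr() for _ in range(100_000)) / 100_000
assert 18 < avg < 22    # close to the theoretical mean 1/p = 20
```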
The probability that two alleles in generation \(t\) are copies of the same allele in generation \(t-1\) is \(\frac{1}{2N_e}\) (i.e. the probability to coalesce).
For a coalescent event to occur \(t\) generations ago, the two alleles must NOT have coalesced in any of the \(1, \ldots, t-1\) generations before it.
The probability that two alleles coalesced \(t\) generations ago is therefore geometric:
\[ P(T=t) = \left(1 - \frac{1}{2N_e}\right)^{t-1} \frac{1}{2N_e} \]
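So the coalescence time \(T\) is geometric with success probability \(1/(2N_e)\), and its mean is \(2N_e\) generations. A numerical sketch (assuming a hypothetical \(N_e = 100\)):

```python
# T ~ Geometric(p) with p = 1/(2*Ne); its mean should be 2*Ne generations.
Ne = 100
p = 1 / (2 * Ne)

# Truncate the infinite sum well into the negligible tail.
mean_T = sum(t * (1 - p)**(t - 1) * p for t in range(1, 10_000))
assert abs(mean_T - 2 * Ne) < 1e-6
```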
This time we draw samples without replacement from a finite population of size \(N\) containing \(K\) known successes, and the random variable \(X\) counts the number of successes in \(n\) draws: \(X \sim \text{Hypergeometric}(N, K, n)\).
Properties
\(P( X = x) = \frac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}}\)
Analogy
Drawing exactly 5 red balls and 3 blue balls when sampling 8 balls without replacement from a total of 10 red and 15 blue balls.
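A sketch of this example with the hypergeometric pmf (helper name ours): \(N = 25\) balls, \(K = 10\) red, \(n = 8\) draws, asking for exactly 5 red:

```python
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(X = x successes in n draws w/o replacement; K successes among N)."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

prob = hypergeom_pmf(5, 25, 10, 8)
assert abs(prob - comb(10, 5) * comb(15, 3) / comb(25, 8)) < 1e-12
print(round(prob, 4))  # probability of exactly 5 red
```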
We can view the p-value in Fisher’s exact test as a tail sum (cdf) of a hypergeometric distribution.
| | Treatment | Control | Row sum |
|---|---|---|---|
| Y | a | b | M |
| Not Y | c | d | N-M |
| Column sum | n | N-n | N |
Fisher’s exact test assumes that the row sums and the column sums are fixed.
The observed contingency table then follows a hypergeometric distribution with probability:
\[ p(x=a) = \frac{\binom{M}{a}\binom{N-M}{n-a}}{\binom{N}{n}} \]
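A sketch of a one-sided Fisher p-value as a hypergeometric tail sum (the helper names and the toy table are ours):

```python
from math import comb

def hyper_p(x, M, n, N):
    """Hypergeometric probability of x in cell 'a' given fixed margins."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def fisher_one_sided(a, b, c, d):
    """Upper-tail p-value: tables with cell 'a' at least as large as observed."""
    M, n, N = a + b, a + c, a + b + c + d
    return sum(hyper_p(x, M, n, N) for x in range(a, min(M, n) + 1))

# Toy 2x2 table: a=3, b=1, c=1, d=3 -> tail = P(x=3) + P(x=4) = 17/70.
assert abs(fisher_one_sided(3, 1, 1, 3) - 17 / 70) < 1e-12
```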
To estimate the population size of a particular animal in a region, ecologists perform the following mark–recapture procedure: capture and mark \(M\) animals, release them, then later capture \(n\) animals and count the number \(i\) of marked ones among them (this method has also been adopted in health surveys and microbiome studies).
We denote the number of marked animals in the second capture as \(X\); \(X\) clearly follows a hypergeometric distribution.
In our experiment we observe \(X = i\), so the probability of this observation given the unknown population size \(N\) is
\[ P(X=i \mid N) = \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}} \]
To find the \(N\) that maximizes this likelihood, consider the ratio of successive likelihoods
\[ r = \frac{P(X=i \mid N)}{P(X=i \mid N-1)} \]
The likelihood increases with \(N\) (i.e. \(r \geq 1\)) exactly when
\[ \begin{aligned} (N-M)(N-n) \geq\; & N(N-M-n+i) \\ \Leftrightarrow \quad & N \leq \frac{Mn}{i} \end{aligned} \]
The maximum likelihood estimate of the population size is therefore the largest integer not exceeding \(\frac{Mn}{i}\), i.e. \(\hat{N} = \lfloor Mn/i \rfloor\) (the Lincoln–Petersen estimator).
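A brute-force sketch checking that \(\lfloor Mn/i \rfloor\) maximizes the likelihood (the numbers \(M = 50\), \(n = 40\), \(i = 9\) are hypothetical):

```python
from math import comb, floor

def likelihood(N, M, n, i):
    """P(X = i | N) for the mark-recapture hypergeometric model."""
    return comb(M, i) * comb(N - M, n - i) / comb(N, n)

M, n, i = 50, 40, 9
N_hat = floor(M * n / i)  # floor(2000/9) = 222

# The likelihood over N = M+n-i .. 1000 peaks exactly at N_hat.
best = max(range(M + n - i, 1001), key=lambda N: likelihood(N, M, n, i))
assert best == N_hat
```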